Automatic Style Categorisation of Corpora in the Greek Language

Authors

  • George Tambouratzis
  • Stella Markantonatou
  • Nikolaos Hairetakis
  • George Carayannis
Abstract

This article proposes a system for the automatic style categorisation of text corpora in the Greek language. The categorisation is based largely on the type of language used in a text, for example whether or not it is representative of formal Greek. To arrive at this categorisation, the system exploits the highly inflectional nature of the Greek language. For each text, a vector of both structural and morphological characteristics is assembled. Categorisation is achieved by comparing this vector to given archetypes using a statistical method. Experimental results reported in the article indicate an accuracy exceeding 98% in the categorisation of a corpus of texts spanning different registers.
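The abstract does not specify the features or the statistical method, so the following is only a minimal sketch of the general idea: build a feature vector per text and assign the text to the nearest archetype. The suffix list, the choice of features, and the standardised Euclidean distance are all assumptions made for illustration, not the authors' method.

```python
import math
import re
from statistics import mean, pstdev

# Hypothetical word endings used as morphological cues; the actual
# feature set of the paper is not given in the abstract.
SUFFIXES = ("ων", "ους", "εως")


def features(text: str) -> list[float]:
    """Build a small vector of structural and morphological features."""
    # ';' is the Greek question mark, so it also ends a sentence.
    sentences = [s for s in re.split(r"[.;!]+", text) if s.strip()]
    words = re.findall(r"\w+", text.lower())
    if not words or not sentences:
        raise ValueError("text must contain at least one word")
    # Structural: mean sentence length (in words) and mean word length.
    structural = [len(words) / len(sentences), mean(len(w) for w in words)]
    # Morphological: relative frequency of each assumed suffix.
    morphological = [
        sum(w.endswith(s) for w in words) / len(words) for s in SUFFIXES
    ]
    return structural + morphological


def archetype(vectors: list[list[float]]) -> tuple[list[float], list[float]]:
    """Summarise a category by per-feature mean and standard deviation."""
    columns = list(zip(*vectors))
    means = [mean(c) for c in columns]
    # Fall back to 1.0 when a feature is constant, to avoid dividing by zero.
    stdevs = [pstdev(c) or 1.0 for c in columns]
    return means, stdevs


def distance(vec: list[float], arch: tuple[list[float], list[float]]) -> float:
    """Standardised Euclidean distance (an assumed stand-in metric)."""
    means, stdevs = arch
    return math.sqrt(
        sum(((x - m) / s) ** 2 for x, m, s in zip(vec, means, stdevs))
    )


def categorise(vec: list[float], archetypes: dict) -> str:
    """Return the label of the archetype closest to the vector."""
    return min(archetypes, key=lambda label: distance(vec, archetypes[label]))
```

In use, one archetype would be built from labelled example texts of each register (for instance `archetype([features(t) for t in formal_texts])`, where `formal_texts` is a hypothetical list of strings), and a new text would be labelled with `categorise(features(new_text), archetypes)`.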




Publication date: 2000